I.  INTRODUCTION

A.  Background 

Factor  analysis  is  a  frequently  used  model  building  technique, 
especially  in  sciences  where  a  large  number  of  variables  need  to  be 
studied.  Unfortunately,  little  work  has  been  done  on  ways  of  testing  the 
goodness  of  fit  of  the  model  to  the  data.  Several  techniques  for  testing 
this  goodness  of  fit  have  been  evaluated  in  this  investigation.  In 
addition,  these  techniques  were  used  to  evaluate  the  maximum  likelihood 
estimation  procedure  as  a  factor  analytic  method. 

For  many  years,  factor  analysis  has  been  used  as  a  research  tool  for 
finding  the  major  or  meaningful  influences  on  a  set  of  variables.  Many 
different  methods  for  finding  these  influences  (or  factors)  have  been 
proposed  and  used  (Harman,  1967),  but  most  are  strictly  non-inferential 
in  nature.  That  is,  they  treat  the  observed  correlations  as  though  they 
are  population  values,  and  the  resulting  factor  loadings  are  calculated 
directly  from  the  sample  correlations.  Thus,  the  statistical  problems 
of  the  sampling  of  individuals  are  ignored,  and  results  are  usually 
considered as though they are population values. On the other hand, the
method of maximum-likelihood estimation provides a statistical procedure
for the estimation of the parameters of the factor analytic model, based
on the assumption that the original variables follow a multivariate
normal distribution.

The basic assumption of factor analysis is that

$$x = \Lambda f + \mu + e$$

where x is a random vector of p variables, f is a random vector of r < p
common factor scores, e is a random vector of p uniquenesses, $\Lambda$ is a p x r
matrix of factor loadings, and $\mu$ is a p vector of means. It is further
assumed that E(f) = 0, E(e) = 0, E(ff') = I, E(ee') = $D^2$, and E(fe') = 0.
This model is usually written

$$P = AA' + D^2$$

where P is the correlation matrix. Here, x is assumed to be multivariate
normal, and the rows of A are proportional to the rows of $\Lambda$, as Lawley (1940)
has shown that the results are independent of the scale of measurement.
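
As a small numerical illustration of the correlation form of the model, the sketch below builds P = AA' + D^2 from a loading matrix, choosing D^2 so that P has a unit diagonal. The function name, the loading values, and the use of NumPy are assumptions made for illustration; they do not come from the original study.

```python
import numpy as np

def implied_correlation(A):
    """Return P = A A' + D^2, with D^2 chosen so that diag(P) = 1."""
    A = np.asarray(A, dtype=float)
    communality = np.sum(A ** 2, axis=1)   # row sums of squared loadings
    if np.any(communality >= 1.0):
        raise ValueError("each communality must be below 1")
    D2 = np.diag(1.0 - communality)        # unique variances on the diagonal
    return A @ A.T + D2

# Hypothetical loadings: 4 variables, 2 common factors.
A_demo = np.array([[0.8, 0.0],
                   [0.7, 0.1],
                   [0.0, 0.6],
                   [0.1, 0.5]])
P_demo = implied_correlation(A_demo)
```

The off-diagonal elements of P_demo depend only on the loadings, while the diagonal is completed by the uniquenesses.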

B.  Algebraic  Methods  of  Factor  Analysis 

The  principal  axis  method  of  factor  analysis,  developed  by  Hotelling 
(1933),  is  today  probably  the  most  commonly  used  factor  analytic  technique. 
Its  basic  objective  is  to  determine  factors  which  account  for  the  maximum 
amount  of  variance  of  the  observed  variables.  The  first  factor  is  that 
linear combination of the original variables which accounts for maximum
variance. Each subsequent factor accounts for a maximum amount of the
remaining variance, while remaining uncorrelated with all previous factors.
Thus,  the  factors  are  derived  by  an  algebraic  rule  from  the  sample 
correlations,  and  can  be  strongly  influenced  by  sampling  variability. 

The factors are determined by the matrix equation $A = V \Lambda^{1/2}$, where V
contains the normalized eigenvectors of the sample correlation matrix as
columns, and $\Lambda$ is here a diagonal matrix containing the eigenvalues, i.e.,

$$R = V \Lambda V'.$$
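
The eigenvector rule just described can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a small hypothetical correlation matrix; the function name is not from the source.

```python
import numpy as np

def principal_axis_loadings(R, r):
    """Loadings for the first r principal axes of a correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:r]       # indices of the r largest
    lam = np.clip(eigvals[order], 0.0, None)    # guard against tiny negatives
    V = eigvecs[:, order]                       # normalized eigenvectors
    return V * np.sqrt(lam)                     # scale each column by sqrt(eigenvalue)

# Hypothetical 3-variable correlation matrix.
R_demo = np.array([[1.0, 0.5, 0.3],
                   [0.5, 1.0, 0.4],
                   [0.3, 0.4, 1.0]])
A_first = principal_axis_loadings(R_demo, 1)
```

With r equal to the number of variables, the loadings reproduce R exactly; with fewer factors, each column accounts for the largest remaining share of variance.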

Minres (minimum residual) factor analysis (Harman and Jones, 1966)
is another algebraic approach to the problem of obtaining factors. Its
aim is the best possible reproduction of the observed correlations, where
best is defined in terms of a least squares fit. Thus the factor matrix A
is determined such that the sum of squares of the off-diagonal elements
of R - AA' is minimized.

C.  Maximum Likelihood Factor Analysis

In  contrast  to  the  algebraic  techniques  mentioned  previously,  the 
method  of  maximum-likelihood  estimation  requires  the  writing  down  of 
the  density  function  of  the  observations,  given  the  population  parameters. 
Then  the  sample  values  are  considered  as  fixed,  and  the  parameters  are 
considered as the variables. The resulting function, called the likelihood
function,  is  then  maximized  with  respect  to  the  parameters.  The  values 
of  the  parameters  which  maximize  the  likelihood  function  are  termed  the 
maximum-likelihood  estimates  of  the  population  parameters  (Lehmann,  1959). 

The  possibility  of  the  use  of  the  maximum-likelihood  method  for  the 
estimation  of  factor  loadings  has  existed  for  at  least  thirty  years 
(Lawley, 1940). The method has always required an iterative procedure
with a large number of calculations to be performed on each iteration.
Thus it seemed natural that the development of computers would encourage
the use of the method, but Lawley and Maxwell (1963) reported that in some
cases convergence of the likelihood function to its maximum was very
slow or might not even be attained, unless good initial estimates
for the factor loadings were used.

Joreskog (1967a, 1967b) has developed a new computational method which
has the advantage that the iterative procedure always converges. In other
papers, Joreskog (1969, 1970) has extended the maximum-likelihood estimation
method to cover a wide variety of models, including factor analytic ones.
In conjunction with these efforts, Joreskog, Gruvaeus, and van Thillo (1970)
have developed a general computer program to calculate maximum-likelihood
estimates. The maximum-likelihood estimation procedure also provides a
likelihood-ratio test of the number of factors, and this test has been made
available in their general computer program.



In applying the method of maximum likelihood to the general factor
analytic model ($P = AA' + D^2$), one writes the log of the likelihood function,
omitting a function solely of the observations (Lawley and Maxwell, 1963),

$$\log_e L = -\tfrac{n}{2} \left[ \log_e |P| + \mathrm{tr}(R P^{-1}) \right]$$

where R is a sample correlation matrix based on a sample of size n + 1.

This  expression  is  then  maximized  with  respect  to  the  elements  of  the 
matrices  A  and  D,  to  obtain  a  maximum-likelihood  solution. 
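
To make the quantity being maximized concrete, the following sketch evaluates the log-likelihood (up to the omitted function of the observations) for given loadings and unique variances. It assumes NumPy; the function name and the demonstration values are hypothetical, and no optimizer is included.

```python
import numpy as np

def log_likelihood(R, A, d2, n):
    """-(n/2) [log|P| + tr(R P^-1)] with P = A A' + diag(d2)."""
    P = A @ A.T + np.diag(d2)
    sign, logdet = np.linalg.slogdet(P)     # stable log-determinant
    if sign <= 0:
        raise ValueError("P must be positive definite")
    return -0.5 * n * (logdet + np.trace(R @ np.linalg.inv(P)))

# Hypothetical population that fits the model exactly.
A_demo = np.array([[0.8, 0.0],
                   [0.7, 0.1],
                   [0.0, 0.6],
                   [0.1, 0.5]])
d2_demo = 1.0 - np.sum(A_demo ** 2, axis=1)
R_demo = A_demo @ A_demo.T + np.diag(d2_demo)

ll_at_truth = log_likelihood(R_demo, A_demo, d2_demo, n=99)
ll_shrunk = log_likelihood(R_demo, 0.5 * A_demo, d2_demo, n=99)
```

When R is generated exactly by the model, the function is largest at the generating parameters, so ll_at_truth exceeds ll_shrunk.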

Since  the  method  of  maximum-likelihood  estimation  is  an  established 
statistical  procedure,  it  is  desirable  to  see  how  well  it  does  in  practice. 
Browne  (1968)  has  already  shown  that  maximum-likelihood  estimates  are 
preferable  to  many  other  types  when  dealing  with  sample  correlation 
matrices  drawn  from  populations  which  exactly  satisfy  the  factor  analytic 
model,  but  it  remains  to  be  seen  how  well  it  will  work  with  data  more 
like  real  data. 


D.  The  Tucker,  Koopman,  and  Linn  Study 

Tucker, Koopman, and Linn (1969) generated 54 population correlation
matrices in order to study factor analytic methods. One of their simulated
correlation matrices was defined by

$$R = B_1 P_1 B_1 + B_2 P_2 B_2 + B_3 P_3 B_3$$

where $B_1$, $B_2$, and $B_3$ were diagonal matrices with real positive diagonal
elements $b_{1j}$, $b_{2j}$, and $b_{3j}$ (j = 1, 2, ..., p, the number of variables),
respectively. Since $b_{1j}^2$ was the proportion of variance of variable j
due to the major factors, $b_{2j}^2$ the proportion due to the minor factors, and
$b_{3j}^2$ the proportion due to the unique factors, they had:

$$b_{1j}^2 + b_{2j}^2 + b_{3j}^2 = 1.$$
Thus, $b_{1j}^2 + b_{2j}^2$ equaled the communality (common variance) of variable j.
Three different relationships between these coefficients defined the three
types of population matrices used by Tucker et al. Correlation matrices
with $b_{2j}^2 = 0$ and $b_{3j}^2 = (1 - b_{1j}^2)$ exactly fitted the mathematical factor
analytic model (common factors + uniquenesses). With $b_{3j}^2 = 0$ and
$b_{2j}^2 = (1 - b_{1j}^2)$, the correlation matrices constituted the simulation
model, as $P_2$ contained the accumulated effect of 180 minor factors.
It was hoped that these simulation matrices would approximate real data
population-correlation matrices which could be thought of as arising from
a few major and many minor influences. They also employed a third model
which contained influences of both minor factors and uniquenesses. For
this middle model, they had $b_{2j}^2 = b_{3j}^2 = (1 - b_{1j}^2)/2$.

All correlation matrices contained 20 variables. Each $P_s$ matrix
(s = 1, 2, 3) was constructed from the relationship $P_s = A_s^* A_s^{*\prime}$, where
$A_s^*$ was obtained by adjusting the rows of $A_s$ to be of unit length.
The $A_1$ matrices were generated by random processes and contained
either three or seven columns, representing the number of major factors
in each correlation matrix. The $A_2$ matrices were generated by another
random process so that the effect of $P_2$ was as though there were 180 minor
factors in it. $P_3$ was an identity matrix, as the factor analytic model
assumes a unique factor for each variable, and that these unique factors
are uncorrelated. Tucker et al. also used three levels of entries in the
$B_1$ matrices: high (.6, .7, .8), wide (.2, .3, .4, .5, .6, .7, .8), and low
(.2, .3, .4). Thus their design was three (models) x two (number of major
factors) x three (levels of coefficients), and they generated three
correlation matrices for each of the eighteen cells. Tucker, Koopman,
and Linn were interested in comparing several factor analytic techniques,
but the data they generated are useful for studying any procedures
related to factor analysis.
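
The construction of a population correlation matrix under the three models can be sketched as follows. This is only an illustration of the algebra: the random loadings below are plain normal deviates from NumPy, a stand-in assumed here, not the random processes actually used by Tucker, Koopman, and Linn, and all names are hypothetical.

```python
import numpy as np

def unit_rows(A):
    """Rescale every row of A to unit length."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

def population_correlation(A1, A2, b1, model):
    """R = B1 P1 B1 + B2 P2 B2 + B3 P3 B3 for one of the three models."""
    b1 = np.asarray(b1, dtype=float)
    rest = 1.0 - b1 ** 2                      # variance left after major factors
    if model == "formal":
        b2_sq, b3_sq = np.zeros_like(rest), rest
    elif model == "simulation":
        b2_sq, b3_sq = rest, np.zeros_like(rest)
    elif model == "middle":
        b2_sq, b3_sq = rest / 2.0, rest / 2.0
    else:
        raise ValueError("model must be 'formal', 'simulation', or 'middle'")
    P1 = unit_rows(A1) @ unit_rows(A1).T      # major factor domain
    P2 = unit_rows(A2) @ unit_rows(A2).T      # minor factor domain
    P3 = np.eye(len(b1))                      # uncorrelated unique factors
    B1, B2, B3 = np.diag(b1), np.diag(np.sqrt(b2_sq)), np.diag(np.sqrt(b3_sq))
    return B1 @ P1 @ B1 + B2 @ P2 @ B2 + B3 @ P3 @ B3

rng = np.random.default_rng(0)
A1_demo = rng.standard_normal((20, 3))        # 3 major factors (stand-in values)
A2_demo = rng.standard_normal((20, 180))      # 180 minor factors (stand-in values)
b1_demo = rng.choice([0.6, 0.7, 0.8], size=20)  # "high" level of B1 entries
R_formal = population_correlation(A1_demo, A2_demo, b1_demo, "formal")
```

Because the three squared coefficients sum to one for every variable, each resulting matrix has a unit diagonal, and under the formal model the matrix minus its unique variances has rank equal to the number of major factors.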

In the Tucker, Koopman, and Linn (1969) study, the authors used
a random process to generate conceptual input factor loadings for the
major factor domain. They combined these with random normal deviates,
applied a skewing function, and multiplied by the matrices $B_1$, in order
to get to actual input factor loadings $B_1 A_1^*$ (where $A_1^* A_1^{*\prime} = P_1$).
The authors used joint rotations of actual input factors with output
factors, and also rotations of output factors only, to assess the degree
to which actual input factors were found on output. Thus there were two
methods of comparison used, and each resulted in a separate index (coefficient
of congruence) for each actual input factor.

Although the raw data of the Tucker, Koopman, and Linn study consisted
of population-correlation matrices and not samples, some of their results
can serve as standards for some of the results of the current study. In
general, the reproduction of the actual input factors in the output factors
was very good for the formal model, and poorer for the simulation model.
The reproduction was good with a high level of $b_{1j}^2$ and poorer for a low
level. Thirdly, results were better for three factors than for seven.
Finally, the combination of simulation model, low $b_{1j}^2$, and seven factors
produced extremely poor results. These results led Tucker et al. to conclude
that the quality of factor analytic results depended heavily on the design
and conduct of the study.

E.  Goodness  of  Fit 

The set of factor analytic methods can be divided into two parts:
exploratory methods, which are used in early investigations in an area,
with the purpose of reducing a large number of variables to a smaller
number of factors when the investigator has no a priori hypotheses as to
the composition of the factors; and confirmatory methods, which are used
by investigators with specifiable hypotheses about the factors. The
present study considers confirmatory factor analysis only, and a major
interest is in the discovery or development of a measure which would
reflect the degree of fit of the final solution to the specified
hypothesis. It is possible to test this hypothesis via the likelihood-ratio
technique, and although the distribution of the likelihood-ratio
statistic has not been tabled, it is distributed approximately as a $\chi^2$
in large samples (Lawley and Maxwell, 1963). Unfortunately, this test
sets up the hypothesis as a null hypothesis, and as the sample size
increases, it is more likely to be rejected, as no hypothesis is exactly
true. Thus, this test is of little use to many researchers who are
interested in how well their data agree with their model, fully realizing
that their model cannot be exactly true in the population. Therefore what
is needed is a measure to assess the goodness of fit of the model to the
data. Thus, the problem in this study is different from the one considered
by Tucker, Koopman, and Linn, as they were interested in factor matching, while
here the aim is to have one index to measure the total goodness of fit.

Tucker (personal communication) has suggested a measure

$$\rho_{1m} = \frac{\chi_0^2/df_0 - \chi_m^2/df_m}{\chi_0^2/df_0 - 1}$$

where $\chi^2$/d.f. is the chi-square approximate test criterion for the
likelihood-ratio test statistic divided by its degrees of freedom, taken
after zero factors and after m factors have been extracted. This measure
is analogous to a percent of variance accounted for by the model, as the
expected value of a $\chi^2$ random variable divided by its degrees of freedom
is one. More recently, Tucker and Lewis (1970) have developed a second
reliability coefficient,

$$\rho_{2m} = \frac{M_0 - M_m}{M_0 - 1/n'_m}$$

where $n'_m = N - 1 - \frac{1}{6}(2p + 5) - \frac{2}{3}m$, p = number of variables, $M_m = F_m/df_m$,
$F_m$ = minimum value of $F(A, D) = \log_e |P| + \mathrm{tr}(RP^{-1}) - \log_e |R| - p$
(Joreskog, 1967b) for m factors, and $df_m$ = degrees of freedom for m factors.
It was hoped that this coefficient would be independent of the sample size
and would provide an estimate of the goodness of fit of the factor
analytic model in the population. Tucker and Lewis calculated $\rho_{2m}$ for the
number of major factors for some of the population-correlation matrices
of Tucker, Koopman, and Linn. These values (Table 1) can serve as targets
for the current study. These two measures ($\rho_{1m}$ and $\rho_{2m}$) are similar (as
can be seen by substituting $\chi_m^2 = n'_m F_m$ in $\rho_{1m}$), but not identical. It
is hoped that one or both of them are good indicators of goodness of fit
for maximum-likelihood factor analysis.
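
The two reliability coefficients can be computed side by side from the minimized fit-function values. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the degrees of freedom use the standard formula for unrestricted maximum-likelihood factor analysis, df = [(p - m)^2 - (p + m)]/2, which the text does not spell out.

```python
def n_prime(N, p, m):
    """Multiplier n' = N - 1 - (2p + 5)/6 - 2m/3."""
    return N - 1 - (2 * p + 5) / 6 - 2 * m / 3

def degrees_of_freedom(p, m):
    """Assumed standard df for m common factors and p variables."""
    return ((p - m) ** 2 - (p + m)) / 2

def rho1(F0, Fm, N, p, m):
    """Coefficient based on chi-square / df after 0 and after m factors."""
    q0 = n_prime(N, p, 0) * F0 / degrees_of_freedom(p, 0)
    qm = n_prime(N, p, m) * Fm / degrees_of_freedom(p, m)
    return (q0 - qm) / (q0 - 1)

def rho2(F0, Fm, N, p, m):
    """Tucker-Lewis coefficient based on M = F / df."""
    M0 = F0 / degrees_of_freedom(p, 0)
    Mm = Fm / degrees_of_freedom(p, m)
    return (M0 - Mm) / (M0 - 1 / n_prime(N, p, m))
```

Both coefficients equal one when the chi-square for m factors equals its degrees of freedom, and both exceed one when the fit is better than chance expectation. They differ only because n' depends on m, a difference that vanishes as N grows.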

Table 1

Values of $\rho_2$ Obtained by Tucker and Lewis from Eight of the Tucker,
Koopman, and Linn Population Correlation Matrices ($N = \infty$, p = 20)

3 Factors in Major Domain, Reliabilities for 3 Common Factor Models

                    Formal Model    Simulation Model
high $b_{1j}^2$         1.00              .83
low $b_{1j}^2$          1.00              .55

7 Factors in Major Domain, Reliabilities for 7 Common Factor Models

                    Formal Model    Simulation Model
high $b_{1j}^2$         1.00              .71
low $b_{1j}^2$          1.00              .48

Another possible measure of goodness of fit is the sum of squares of
differences between the correlations implied by the model and those
reproduced by the actual output factors. Browne (1968) suggested this
measure of goodness of fit:

$$c_1 = \sum_{i=1}^{p} \sum_{j=1}^{i} [\Phi\Phi' - AA']_{ij}^2$$

where A is the sample factor matrix, $\Phi$ is the population factor matrix,
and p is the number of variables. Of course, another possibility is to
exclude the diagonal elements. This would emphasize reproduction of the
correlations, while ignoring the communalities:

$$c_2 = \sum_{i=2}^{p} \sum_{j=1}^{i-1} [\Phi\Phi' - AA']_{ij}^2 .$$

Both measures were scaled by the total sum of squares in order to produce
coefficients, $r_1$ and $r_2$, with upper limits of 1.00. In most cases they
should vary between zero and one.

$$r_1 = 1 - \frac{c_1}{\sum_{i=1}^{p} \sum_{j=1}^{i} [\Phi\Phi']_{ij}^2}, \qquad
r_2 = 1 - \frac{c_2}{\sum_{i=2}^{p} \sum_{j=1}^{i-1} [\Phi\Phi']_{ij}^2}$$

These measures ($c_1$, $c_2$, $r_1$, $r_2$) are all invariant under orthogonal rotation
of the sample factor matrix A, and of the hypothesis factor matrix $\Phi$.
All six measures (including $\rho_1$ and $\rho_2$) were obtained for all 96 sample
correlation matrices.
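
The four sum-of-squares measures can be sketched as below, summing over the lower triangle with and without the diagonal. The function name and the NumPy implementation are assumptions for illustration; any constant scaling the original program may have applied is not reproduced.

```python
import numpy as np

def browne_measures(Phi, A):
    """Return (c1, c2, r1, r2) comparing Phi Phi' with A A'."""
    target = Phi @ Phi.T                       # correlations implied by the hypothesis
    diff = target - A @ A.T                    # discrepancies from the sample factors
    p = target.shape[0]
    with_diag = np.tril_indices(p)             # pairs with j <= i
    without_diag = np.tril_indices(p, k=-1)    # pairs with j < i
    c1 = np.sum(diff[with_diag] ** 2)
    c2 = np.sum(diff[without_diag] ** 2)
    r1 = 1.0 - c1 / np.sum(target[with_diag] ** 2)
    r2 = 1.0 - c2 / np.sum(target[without_diag] ** 2)
    return c1, c2, r1, r2
```

Since the measures depend on the factor matrices only through the products Phi Phi' and AA', rotating either matrix by an orthogonal transformation leaves all four values unchanged.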

II.  METHOD

A.  Data  Used 

Due to limitations on computer time, it was necessary to use only
some of the population correlation matrices from the Tucker, Koopman, and
Linn study. In order to preserve the effects due to the independent variables
used in generating those matrices, it was decided to randomly select
one matrix from each of eight cells in their design. The eight cells were
created by using two levels of each of the three independent variables used
by Tucker et al., i.e., model (formal vs. simulation), level of $b_{1j}^2$ (high vs. low),
and number of factors in the major domain (3 vs. 7). The eight matrices used
are identified in Table 2. The level of battery (1, 2, or 3) was used by
Tucker, Koopman, and Linn to designate a particular correlation matrix, as
they had three such matrices in each cell in their design. In the current
study, one battery was randomly selected from each of the eight cells of
interest. In order to include the parameter of sample size, it was decided
to draw samples of size 100, 400, and 1600 from each population-correlation
matrix. To achieve some stability of results, four sample correlation
matrices were drawn from each population-correlation matrix, at each level
of sample size, yielding 96 sample correlation matrices.


Table 2

Matrix   Level of $b_{1j}^2$   Model        Number of Factors   Battery
  1          high           formal              3              2
  2          high           formal              7              2
  3          high           simulation          3              3
  4          high           simulation          7              3
  5          low            formal              3              3
  6          low            formal              7              1
  7          low            simulation          3              1
  8          low            simulation          7              1

B.  Generation of Sample Correlation Matrices

The intuitive way to generate sample correlation matrices is to
generate samples of random variables from a multivariate normal distribution
with a specified correlation matrix (Kaiser and Dickman, 1962) and
to calculate the sample correlation matrices directly from this raw data.
However, this method requires a large quantity of random numbers and a
large amount of computer time, especially when large sample sizes are
required. To avoid this problem, a more economical procedure, described
by Odell and Feiveson (1966) and used by Browne (1968), was used in this
study.

In order to compute a sample correlation matrix R when given the
population correlation matrix P, one uses

$$R = (\mathrm{Diag}[A])^{-1/2} \, A \, (\mathrm{Diag}[A])^{-1/2}$$

and

$$A = (\Omega T)(\Omega T)'$$

where $P = \Omega\Omega'$ and the elements of T (lower triangular) are chosen as
independently distributed variables:

$t_{ij}$ is distributed as N(0, 1)  (i > j)
$t_{ii}$ is distributed as Chi with (N - i) degrees of freedom

For convenience of calculation, $\Omega$ was chosen to be lower triangular and
was obtained by the square root method for triangular factoring (Dwyer, 1945).
Thus, this method requires only the generation of p(p - 1)/2 random
normal deviates and p (p = 20, the number of variables) random Chi variables
for each sample correlation matrix, regardless of the sample size. Also a
large amount of computational time is saved in the calculation of the
correlation matrix.
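
The procedure above can be sketched compactly. This is a minimal illustration, not the original program: it assumes NumPy, uses numpy.linalg.cholesky for the lower triangular factor, and draws the Chi variables as square roots of NumPy chi-square deviates rather than through the approximation used in the study. The function name is hypothetical.

```python
import numpy as np

def sample_correlation(P, N, rng):
    """Draw one sample correlation matrix for sample size N from population P."""
    p = P.shape[0]
    omega = np.linalg.cholesky(P)                    # lower triangular, P = omega omega'
    T = np.zeros((p, p))
    rows, cols = np.tril_indices(p, k=-1)            # strictly lower triangle, i > j
    T[rows, cols] = rng.standard_normal(rows.size)   # p(p-1)/2 normal deviates
    dfs = N - np.arange(1, p + 1)                    # N - i degrees of freedom, i = 1..p
    T[np.diag_indices(p)] = np.sqrt(rng.chisquare(dfs))  # p Chi deviates
    G = omega @ T
    A = G @ G.T                                      # A = (omega T)(omega T)'
    scale = 1.0 / np.sqrt(np.diag(A))
    return A * np.outer(scale, scale)                # rescale to unit diagonal
```

The number of random deviates drawn depends only on p, never on N, which is the economy the text describes. For example, `sample_correlation(P, 400, np.random.default_rng(0))` returns one matrix for a sample of 400.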

In order to generate the random normal deviates for the T matrix, it
was first necessary to generate random integers on the computer. These
integers were converted to real numbers uniformly distributed between
zero and one, and then these were normalized. Unfortunately, there is no
way to pick truly random numbers on the computer, so the random integers
needed were produced by a simple arithmetic process. These random integers
are often called pseudo random integers, because they are produced by a
deterministic process. Richardson (1969) reviewed several methods of
generating pseudo random integers and chose the multiplicative congruential
method as the best for the IBM 360, on the basis of randomness (passing
statistical tests), length of period (number of integers generated before
the sequence repeats itself), and generation time needed. This method is
based on the relation $X_{i+1} = aX_i \pmod{m}$, which means that $aX_i$ is divided by
m and the (i+1)st random integer $X_{i+1}$ is set equal to the remainder. *

Muller (1959) compared several methods of generating pseudo random
normal deviates from pseudo random numbers on the interval (0, 1). The
direct approach (Box and Muller, 1958) was picked as best because of the
resulting reliability in the tails of the distribution and the relatively
greater accuracy when compared with other methods. The transformations are:

$$X_1 = (-2 \log_e U_1)^{1/2} \cos 2\pi U_2$$

$$X_2 = (-2 \log_e U_1)^{1/2} \sin 2\pi U_2$$

*  The  modulus  m  was  set  to  2 *"  in  order  to  provide  the  maximum  possible 
period.  The  constant  a  was  chosen  by  Richardson  from  1500  different 
multipliers,  as  the  one  which  produced  the  integers  with  the  best  statistical 
properties.  Integers  on  the  IBM  360  occupy  32  binary  digits  (bits),  but 
real numbers use only 24 bits (the remaining 8 are used for the exponent).
Thus,  the  pseudo  random  integers  were  converted  to  a  uniform  distribution 
by  merely  inserting  the  appropriate  exponent  in  the  first  eight  bits,  so 
that  the  real  numbers  would  lie  between  zero  and  one. 


where $U_1$ and $U_2$ are pseudo random numbers from the interval (0, 1), and
$X_1$ and $X_2$ are independent variables from the normal distribution with
mean zero and unit variance [N(0, 1)].
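
The chain from pseudo random integers to normal deviates can be sketched as follows. The multiplier 16807 and modulus 2^31 - 1 are a well-known textbook pair assumed here purely for illustration; they are not the constants selected by Richardson, and the function names are hypothetical.

```python
import math

MODULUS = 2 ** 31 - 1      # assumed modulus, not the one used in the study
MULTIPLIER = 16807         # assumed multiplier, not Richardson's choice

def lehmer(seed):
    """Yield uniforms in (0, 1) from X[i+1] = a * X[i] mod m."""
    x = seed
    while True:
        x = (MULTIPLIER * x) % MODULUS   # remainder after dividing a*X by m
        yield x / MODULUS

def box_muller(u1, u2):
    """Turn two uniforms into two independent N(0, 1) deviates."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

def normal_deviates(count, seed=1):
    """Return `count` pseudo random normal deviates."""
    uniforms = lehmer(seed)
    out = []
    while len(out) < count:
        out.extend(box_muller(next(uniforms), next(uniforms)))
    return out[:count]
```

Each call to box_muller consumes two uniforms and returns two deviates, so no random numbers are wasted.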

In order to determine the pseudo random Chi variables for the
diagonals of the T matrix, the following approximation was used
(Abramowitz and Stegun, 1966):

$$\chi_p^2 \approx \nu \left[ 1 - \frac{2}{9\nu} + (x_p - h_\nu) \sqrt{\frac{2}{9\nu}} \right]^3 \qquad (\nu > 30)$$

where $\nu$ = degrees of freedom and $x_p$ is a pseudo random normal deviate. The
value for $h_\nu$ is gotten from the relation $h_\nu = \frac{60}{\nu} h_{60}$, where $h_{60}$ is tabled
against values of $x_p$ from -3.5 to +3.5 by Abramowitz and Stegun. A cubic
equation was used to interpolate between the tabled values of $h_{60}$:

$$\hat{h}_{60} = -.000924 x_p - .000159 x_p^2 + .000308 x_p^3 + .000189$$

The correlation between $h_{60}$ and $\hat{h}_{60}$ (for the 15 tabled values) was 1.0000.
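
A short sketch of this approximation follows. The cubic coefficients are as read from the text above (the linear coefficient is taken to be -.000924), the function names are hypothetical, and taking the square root to obtain a Chi deviate is an assumption about how the chi-square value was used.

```python
import math

def h60(x):
    """Cubic interpolation of the tabled correction h_60."""
    return -0.000924 * x - 0.000159 * x ** 2 + 0.000308 * x ** 3 + 0.000189

def chi_square_quantile(x, v):
    """Approximate chi-square value for normal deviate x and v > 30 d.f."""
    h = (60.0 / v) * h60(x)
    return v * (1.0 - 2.0 / (9.0 * v) + (x - h) * math.sqrt(2.0 / (9.0 * v))) ** 3

def chi_deviate(x, v):
    """Chi deviate (assumed to be the square root of the chi-square value)."""
    return math.sqrt(chi_square_quantile(x, v))
```

Feeding a pseudo random N(0, 1) deviate in as x yields a pseudo random Chi deviate with v degrees of freedom; for x = 0 the function returns a value close to the median of the chi-square distribution.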

C.  Factor Analyses

Each correlation matrix was factored using Joreskog's (1967a)
maximum-likelihood factor analysis program. The maximum number of iterations
was set to 100 and the probability of chance occurrence was set to 1.0 so
that all solutions were obtained. Solutions were obtained for the number
of factors in the major domain. Additionally, the likelihood ratio tests
of the number of common factors were obtained from zero up to the number of
factors in the major domain, so that $\rho_1$ and $\rho_2$ could be calculated for
each possible number of factors. The coefficients $c_1$, $c_2$, $r_1$, and $r_2$ were
calculated for each factor matrix.


D.  Analysis  of  Variance 

In order to determine the effects of the four independent variables
(model, level of $b_{1j}^2$, number of factors, and number of observations) on
the measures of goodness of fit, six separate fixed-factor analyses of
variance were performed, each being 2 x 2 x 2 x 3.

III.  RESULTS


A.  Reliability Coefficients $\rho_1$ and $\rho_2$

The means of $\rho_1$ and $\rho_2$, across the four samples of the same size for
each population-correlation matrix, are presented in Appendix A. There
were only small differences between the two coefficients. For very large
sample sizes, the formulas yield quite similar results, as is illustrated
by the samples of size 1600. The correlation between $\rho_1$ and $\rho_2$ for the
number of factors in the major domain, across all 96 sample factorings,
was .998. All results are discussed mainly in terms of $\rho_2$, as it is the
later, published version.

$\rho_2$ was an excellent measure of goodness of fit for the factor matrices
obtained from samples from the population-correlation matrices (hereafter
called sample factor matrices) of the formal model. In the three factor
matrices, with high or low $b_{1j}^2$ (Table 3), $\rho_2$ was very close to 1.00, for
all sample sizes. With seven factors and high $b_{1j}^2$ (Table 4) the results
were as good. However, with seven factors but low $b_{1j}^2$, $\rho_2$ went above 1.00
after four factors with only 100 observations. The average value of $\rho_2$
after seven factors were obtained was 1.4285, and the individual values
were 1.0956, 1.0428, 1.4710, and 2.1046. This value, 1.4285, was much
larger than the population value of 1.00 obtained by Tucker and Lewis (1970).
While this result was probably due to the small sample size of 100, it
reflected an undesirable property for a reliability coefficient. However,
with 400 and 1600 observations, results were much better. Thus, the method
of maximum likelihood resulted in good solutions as measured by $\rho_2$ when the
populations exactly fitted the factor analytic model. The only exception
was with variables of low communality, in which case more observations were
necessary to obtain a good fit.

Table 3

Means of $\rho_2$ for Matrices with 3 Factors in the Major Domain,
After 3 Factors Have Been Obtained

                  Sample size    Formal Model    Simulation Model
High $b_{1j}^2$        100          1.0011            .8535
                       400          1.0018            .8253
                      1600          1.0004            .8319
Low $b_{1j}^2$         100           .9929            .6157
                       400          1.012?            .5690
                      1600          1.0023            .5623

Table 4

Means of $\rho_2$ for Matrices with 7 Factors in the Major Domain,
After 7 Factors Have Been Obtained

                  Sample size    Formal Model    Simulation Model
High $b_{1j}^2$        100          1.0139            .7348
                       400          1.0014            .6829
                      1600           .9997            .7118
Low $b_{1j}^2$         100          1.4285            .6022
                       400          1.0482            .5312
                      1600           .9984            .4982

Table 5

Means of $\rho_2$ for Matrices with 3 Factors in the Major Domain,
After 4 Factors Have Been Obtained

                  Sample size    Formal Model    Simulation Model
High $b_{1j}^2$        100          1.0116            .8776
                       400          1.0054            .8422
                      1600          1.0011            .8442
Low $b_{1j}^2$         100          1.0416            .6743
                       400          1.0257            .6029
                      1600          1.0050            .610?

Results from the simulation model matrices were not nearly as good.
In no case did $\rho_2$ reach the value of 1.00. The largest values were
obtained with high $b_{1j}^2$ and three factors (Table 3), but the highest was
.8535. There was a trend for $\rho_2$ to decrease with increased sample size
for simulation model matrices with low $b_{1j}^2$ (Tables 3 and 4). $\rho_2$ also
became smaller as the number of factors in the major domain increased and
as $b_{1j}^2$ decreased.

The calculation of $\rho_2$ was extended to four factors in the three factor
matrices, in order to see how it behaved. It was thought that there might
be some leveling off after three factors. This did occur for the formal
model (Table 5), after $\rho_2$ had already reached 1.000. There was some tendency
for the values of $\rho_2$ to level off for the simulation model, high $b_{1j}^2$,
as the increase from two factors to three factors was much greater than
that from three factors to four factors (Appendix A). In the simulation
model with low $b_{1j}^2$, there were no signs of a leveling off of $\rho_2$ after
three factors.


B.  Other  Goodness  of  Fit  Measures 

The results for $c_1$ and $c_2$ (Table 6) were very similar, as were the
results for $r_1$ and $r_2$. To get an idea of the degree of similarity, the
coefficients were correlated across all 96 matrices. Since $c_1$ correlated
.968 with $c_2$, and $r_1$ correlated .994 with $r_2$, results will be discussed
in terms of $c_1$ and $r_1$ only. The coefficient $c_1$ behaved exactly as expected.
For all eight matrices, $c_1$ got smaller as the sample size increased. In
all cases, increasing the number of factors, while holding model, level of
$b_{1j}^2$, and sample size fixed, caused an increase in $c_1$. In all cases, moving
from the formal model to the simulation model while holding the other three
independent variables fixed caused an increase in $c_1$. Finally, in all

Table 6

Means of the Coefficients $c_1$, $c_2$, $r_1$, and $r_2$, After the Number
of Factors in the Major Domain Have Been Extracted

Matrix   Sample size     $c_1$       $c_2$       $r_1$       $r_2$
  1          100         .9587      .9050      .9782      .9737
             400         .2472      .2296      .9944      .9933
            1600         .0464      .0426      .9990      .9988
  2          100        1.6355     1.4322      .9413      .9201
             400         .4625      .4336      .9834      .9758
            1600         .1035      .0943      .9963      .9947
  3          100        1.4689     1.3214      .9705      .9671
             400         .5290      .4547      .9894      .9887
            1600         .3523      .2?65      .9929      .9929
  4          100        2.5336     2.1420      .8915      .8403
             400         .9252      .6490      .9604      .9516
            1600         .6677      .4593      .9714      .9657
  5          100        1.5298     1.2331      .8541      .8520
             400         .3131      .2732      .9701      .9672
            1600         .0749      .0623      .9929      .9925
  6          100        3.7076     1.8881      .1256      .2423
             400         .7221      .3854      .8297      .8453
            1600         .15?6      .1078      .9392      .9567
  7          100        5.0938     4.173?      .5351      .5258
             400        2.9050     2.0161      .7349      .7709
            1600        2.347?     1.7914      .7858      .7964
  8          100        6.7834     4.1999     -.5998     -.6854
             400        5.1870     3.0805     -.2233     -.2398
            1600        4.7623     2.7143     -.1231     -.0803
                                Table 7

  Standard Deviations of the Coefficients c₁, c₂, r₁, and r₂ After the
       Number of Factors in the Major Domain Have Been Extracted

Matrix   Sample size       c₁           c₂          r₁         r₂

   1         100          .2668        .2555       .0061      .0074
             400          .0696        .0632       .0016      .0018
            1600          .0066        .0071       .0001      .0002

   2         100          .4390        .5500       .0177      .0307
             400          .1100        .1057       .0039      .0059
            1600          .0230        .0240       .0008      .0013

   3         100          .9734        .9287       .0196      .0231
             400          .0416        .0459       .0008      .0011
            1600          .0534        .0457       .0011      .0011

   4         100          .4661        .2975       .0200      .0222
             400          .1814        .1938       .0078      .0148
            1600          .1704        .1382       .0073      .0103

   5         100          .2350        .0933       .022?      .0112
             400          .1363        .1375       .0130      .0159
            1600          .0130        .0061       .0012      .0007

   6         100          .2419        .197?       .0571      .0794
             400          .1819        .0546       .0420      .0219
            1600          .1348        .0111       .031?      .0045

   7         100         2.3294     [illegible]    .2126      .2529
             400          .1106        .0656       .0101      .0075
            1600          .4228     [illegible]    .0386      .0210

   8         100          .6732        .5866       .1503      .2234
             400       [illegible]     .6078       .1521      .2799
            1600          .3716        .1776       .0876      .0692

cases, c₁ increased when b₁ⱼ² went from high to low. (Note, there was
one reversal of this last finding with c₂, which decreased as the level of
b₁ⱼ² went from high to low from matrix 2 to matrix 6.)

Also, r₁ increased with increased sample size, increased values of
b₁ⱼ², fewer factors, and from the simulation model to the formal model.
For high b₁ⱼ², formal model, the values of r₁ were good for three factors,
all sample sizes, while seven factors required a sample size of 400 for a
satisfactory result. With the simulation model, low b₁ⱼ², and three factors,
r₁ reached only .7964 with 1600 observations. In matrix 8 (simulation
model, low b₁ⱼ², seven factors), the values of r₁ were actually negative.
This was partly due to the low total sum of squares in the model correlation
matrix, but this result indicated, much better than did ρ₂, the inaccuracy
of these solutions.

The standard deviations of the coefficients c₁, c₂, r₁, and r₂ were
calculated (Table 7). There was a tendency for the standard deviations
to be smaller with better fit, but there were more reversals than with
the means. Also, since r₁ and r₂ had upper limits of 1.0, their standard
deviations were forced to decrease as the means increased because the
upper limit was being approached.


C. Use of the Likelihood-Ratio Test

Jöreskog (1967b) used the likelihood-ratio technique to test the
hypothesis that the number of factors m was a given number. The exact
distribution of the likelihood-ratio test statistic is not known, but for
large N its distribution is approximately a χ² distribution with
½[(p − m)² − (p + m)] degrees of freedom. If the hypothesis of m factors
was rejected (due to a statistically significant value of the test
statistic), Jöreskog refactored the matrix for m + 1 factors. It was
thought that by


                                Table 8

            Range of Probability Levels for χ² Statistics

                          3 Factor Matrices

                    2 Factors         3 Factors         7 Factors
Matrix   Sample
          size     Min.    Max.      Min.    Max.      Min.    Max.

   1       100    .0000   .0000     .1173   .8533     .9890   .9976
           400    .0      .0        .2691   .9211     .9820  1.0000
          1600    .0      .0        .2028   .8257     .9776   .9968

   3       100    .0000   .0000     .0000   .0000     .0000   .0194
           400    .0      .0        .0      .0        .0000   .0000
          1600    .0      .0        .0      .0        .0      .0

   5       100    .0066   .6504     .2060   .8027     .9376   .9960
           400    .0000   .0000     .1201   .9712     .8897   .9982
          1600    .0      .0        .5070   .9186     .9616   .9923

   7       100    .0000   .0000     .0000   .0000     .0007   .0054
           400    .0      .0        .0      .0        .0000   .0000
          1600    .0      .0        .0      .0        .0    [illegible]

                          7 Factor Matrices

                    4 Factors         6 Factors         7 Factors
Matrix   Sample
          size     Min.    Max.      Min.    Max.      Min.    Max.

   2       100    .0000   .0000     .0000   .0683     .5381   .8465
           400    .0      .0        .0000   .0000     .2903   .7948
          1600    .0      .0        .0      .0        .2043   .7330

   4       100    .0000   .0000     .0000   .0000     .0000   .0000
           400    .0      .0        .0      .0        .0      .0
          1600    .0      .0        .0      .0        .0      .0

   6       100    .0723   .9560     .5019   .9995     .6065   .9998
           400    .0000   .0000     .0840   .6060     .7133   .9684
          1600    .0000   .0000     .0005   .0112     .2279   .5371

   8       100    .0000   .0000     .0000   .0000     .0000   .0000
           400    .0      .0        .0      .0000     .0000   .0000
          1600    .0      .0        .0      .0        .0      .0

Note: In the above table, the entry .0000 means that the number was a zero
when rounded to 4 decimal places. The entry .0 was an exact zero, to the
accuracy of the computations (about 7 decimal places).

looking at the probability levels (probabilities of the chance occurrence
of the observed χ² values) of the test statistic for various numbers of
factors, one might be able to determine the correct number of factors.
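
The sequential procedure described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
program used in the study: the function names are hypothetical, the
χ² statistics in the example are made up, and p = 20 variables is assumed
purely for the example. The degrees of freedom follow the formula
½[(p − m)² − (p + m)], and the upper-tail probability uses the standard
closed-form series for a χ² variable with integer degrees of freedom.

```python
import math


def lr_degrees_of_freedom(p, m):
    """Degrees of freedom for the hypothesis of m factors in p variables."""
    # (p - m)^2 and (p + m) always have the same parity, so this is exact.
    return ((p - m) ** 2 - (p + m)) // 2


def chi2_upper_tail(x, df):
    """P(chi-square with integer df exceeds x), from the finite series."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return min(total, 1.0)
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return min(total, 1.0)


def first_accepted_m(statistics, p, alpha=0.05):
    """Smallest m whose hypothesis is not rejected at level alpha, else None."""
    for m in sorted(statistics):
        df = lr_degrees_of_freedom(p, m)
        if chi2_upper_tail(statistics[m], df) >= alpha:
            return m
    return None


# Hypothetical chi-square statistics for m = 2, 3, 4 factors (illustrative only).
example_statistics = {2: 310.0, 3: 140.0, 4: 110.0}
print(first_accepted_m(example_statistics, p=20))
```

The loop mirrors the refactoring rule: each rejected hypothesis of m factors
leads to a test of m + 1 factors, and the first non-significant result stops
the search.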

In Table 8 are presented the ranges of these probability levels, for
selected numbers of factors. For the formal model (matrices 1 and 5 for
three factors, and matrices 2 and 6 for seven factors), in all cases one
would accept (at any reasonable probability level from .001 to .100) the
hypothesis of the number of factors in the major domain. The probability
levels ranged from a low of .1173 for one high b₁ⱼ², three factor matrix
to a high of .9998 for a low b₁ⱼ², seven factor matrix. However, in one
case, with low b₁ⱼ², three factors, and a sample size of 100, the hypothesis
of two factors was also accepted (p = .6504). With seven factors, and low
b₁ⱼ² (matrix 6), an hypothesis of only four factors was supported with 100
observations and an hypothesis of six factors was supported with 400
observations.

For all simulation model matrices, however, the hypothesis that
the number of factors was equal to the number of factors in the major
domain was rejected. Even the hypothesis of seven factors for a sample
factor matrix with high b₁ⱼ² and only three factors in the major domain
was rejected (although one matrix of sample size 100 did have a p = .0194,
which would not have been rejected at the .01 level). Thus, this test
is appropriate for testing the hypothesis that the factor analytic model
holds exactly in the data, but it is of no use as a measure of goodness
of fit for data that do not fit the model.


D. Analyses of Variance

Separate analysis of variance summary tables for the six measures
ρ₁, ρ₂, c₁, c₂, r₁, and r₂ are presented in Appendix B. These analyses
were performed in order to discover the relative sizes of the effects of
the four independent variables, level of b₁ⱼ², model, number of factors in
the major domain, and sample size. Since the assumption of normality of
analysis of variance was possibly violated, especially with ρ₁, ρ₂, r₁, and
r₂, borderline significant F ratios should not be taken too seriously.

The results for ρ₁ and ρ₂ were again very similar, so results are
discussed in terms of ρ₂. The main effect of model accounted for 61.19%
of the total sum of squares for ρ₂. The average value of ρ₂ for the
formal model was 1.042, while for the simulation model, it was only .668.
The only other large contributor to the total sum of squares (except for
within cell) was the interaction between model and level of b₁ⱼ², which
accounted for 9.07% of the total sum of squares.

                    formal     simulation

    high b₁ⱼ²       1.003         .773
    low b₁ⱼ²        1.080         .563

Four other small but statistically significant effects were also
found. The average value of ρ₂ was .888 for sample factor matrices with
high b₁ⱼ² and .822 for those with low b₁ⱼ². There was also a significant
trend for ρ₂ to decrease with increased sample size. The averages were
.905, .834, and .826, for sample sizes 100, 400, and 1600, respectively.
The B×F and M×F interactions were also significant at the .01 level.


                  3 Factors   7 Factors

    high b₁ⱼ²       .919        .857
    low b₁ⱼ²        .792        .851

                  3 Factors   7 Factors

    formal         1.002       1.082
    simulation      .710        .627
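
The main-effect averages quoted above can be recovered from the four
model-by-level-of-b₁ⱼ² cell means, as in the short Python check below. It is
a sketch that assumes equal cell sizes, so that an unweighted average of cell
means equals the marginal mean; the variable and function names are
illustrative only.

```python
# Cell means of rho_2 from the model by level-of-b table in the text.
cell_means = {
    ("high", "formal"): 1.003,
    ("high", "simulation"): 0.773,
    ("low", "formal"): 1.080,
    ("low", "simulation"): 0.563,
}


def marginal_mean(cells, position, level):
    """Unweighted mean of all cells whose key has `level` at `position`."""
    values = [value for key, value in cells.items() if key[position] == level]
    return sum(values) / len(values)


formal_mean = marginal_mean(cell_means, 1, "formal")
simulation_mean = marginal_mean(cell_means, 1, "simulation")
high_b_mean = marginal_mean(cell_means, 0, "high")
low_b_mean = marginal_mean(cell_means, 0, "low")

# Interaction contrast: the formal-minus-simulation gap at low b
# compared with the same gap at high b.
gap_high = cell_means[("high", "formal")] - cell_means[("high", "simulation")]
gap_low = cell_means[("low", "formal")] - cell_means[("low", "simulation")]
interaction_contrast = gap_low - gap_high

print(formal_mean, simulation_mean, high_b_mean, low_b_mean)
print(interaction_contrast)
```

The gap between the formal and simulation models is more than twice as large
at low b₁ⱼ² as at high b₁ⱼ², which is the pattern behind the model by level
interaction.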

Although the results were similar for c₁ and c₂, only 6.68% of the
total sum of squares was attributable to error for c₁, whereas 13.02% was
error for c₂. This supported the earlier decision to discuss results in
terms of c₁ only, as it was less variable within cells. The main effect
of level of b₁ⱼ² accounted for 25.17% of the total sum of squares. The
mean value of c₁ was approximately .83 for high b₁ⱼ² and 2.807 for low
b₁ⱼ². Model accounted for 24.62% of the total sum of squares, and the mean
for the formal model was .838 while for the simulation model, it was 2.796.
Sample size accounted for 17.37% of the total sum of squares, with c₁
dropping as sample size increased. The means were 2.964, 1.411, and
1.077 for the sample sizes 100, 400, and 1600, respectively. The level
of b₁ⱼ² by model interaction accounted for 13.58% of the total sum of
squares.

                    formal     simulation

    high b₁ⱼ²        .576        1.079
    low b₁ⱼ²        1.101        4.513
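
As a cross-check, the sample-size averages of c₁ can be recomputed from the
cell means in Table 6. The sketch below assumes that every cell contains the
same number of sample matrices, so that averaging cell means gives the
marginal mean. Only n = 100 and n = 400 are averaged across all eight
matrices, because one c₁ entry at n = 1600 is not legible in Table 6. All
names are illustrative.

```python
# c1 cell means copied from Table 6, ordered by matrix number 1 through 8.
c1_by_sample_size = {
    100: [0.9587, 1.6355, 1.4689, 2.5336, 1.5298, 3.7076, 5.0938, 6.7834],
    400: [0.2472, 0.4625, 0.5290, 0.9252, 0.3131, 0.7221, 2.9050, 5.1870],
}

# Matrices 1 and 2 (high b, formal model) at n = 100, 400, and 1600.
c1_high_b_formal = [0.9587, 0.2472, 0.0464, 1.6355, 0.4625, 0.1035]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


sample_size_means = {n: mean(cells) for n, cells in c1_by_sample_size.items()}
high_b_formal_mean = mean(c1_high_b_formal)

print(sample_size_means)
print(high_b_formal_mean)
```

The recomputed values agree with the reported 2.964 and 1.411 for sample
sizes 100 and 400, and with the .576 entry for the high b₁ⱼ², formal model
cell.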

The main effect of number of factors in the major domain (c₁ = 1.322 for
three factors, c₁ = 2.312 for seven factors) and the three interactions
shown below were also significant at the .01 level.

                  3 Factors   7 Factors

    high b₁ⱼ²       .600       1.055
    low b₁ⱼ²       2.044       3.570

                     100        400       1600

    high b₁ⱼ²      1.649       .541       .292
    low b₁ⱼ²       4.279      2.282      1.861

                  3 Factors   7 Factors

    formal          .528       1.148
    simulation     2.116       3.477

Unfortunately, due to negative values for low b₁ⱼ², simulation model,
and 7 factors (summed across sample size), every main effect and interaction
which did not involve sample size was highly significant for r₁ and r₂.
While the negative values (r₂ = -.338, r₁ = -.315) indicated how poor
results were for that combination, the effect was apparently strong
enough to influence most other effects. The main effect of sample size
did account for about 5.1% of the total sum of squares, with r₁ increasing
as sample size increased. The means were .587, .780, and .819 for 100, 400,
and 1600 observations, respectively.




IV.  DISCUSSION 


A major goal of the present study was to find or develop a measure
of goodness of fit for the factor analytic model. One such measure
studied was ρ₂. The main effect of model accounted for 61.19% of the sum
of squares of ρ₂. For samples from population correlation matrices
constructed to exactly fit the factor analytic model, ρ₂ worked
exceedingly well. Only in the case of seven factors and low b₁ⱼ² was it
necessary to have a sample size greater than 100. ρ₂ also had the
desirable property of approaching unity (or nearly so) as the sample
size increased, for the matrices developed from the formal model. The
samples from the simulation model behaved quite differently. Even in the
best case (three factors, high b₁ⱼ²), the average value of ρ₂ was .8535.
Thus, in all cases, ρ₂ reflected the presence of the minor factors.

There was also a significant decrease in ρ₂ for sample sizes 400 and
1600, when compared with 100. This is not a good property for a proposed
measure of goodness of fit, as intuitively one would expect the fit to a
good model to improve with more observations. However, this decrease is
due in part to ρ₂ coming down to 1.000, after going over that value for
samples of 100. There was a significant decrease in c₁ with increased
sample size, showing that the solutions reproduced the correlations
implied by the model better with more observations. This was further
illustrated by the fact that the ρ₂ values were approaching the
population values obtained by Tucker and Lewis. This can be seen by
subtracting the population values (Table 1) from the sample values
(Tables 3 and 4).

                                Table 9

           Differences Between Sample Values and Population
                             Values of ρ₂

                                              Sample Size
                                         100       400      1600

    High b₁ⱼ², Formal Model
        3 Factors                       .0011     .0018     .0004
        7 Factors                       .0139     .0014    -.0003

    High b₁ⱼ², Simulation Model
        3 Factors                       .0235    -.0047     .0019
        7 Factors                       .0248    -.0271     .0018

    Low b₁ⱼ², Formal Model
        3 Factors                      -.0071     .0127     .0023
        7 Factors                       .4285     .0482    -.0016

    Low b₁ⱼ², Simulation Model
        3 Factors                       .0657     .0190     .0123
        7 Factors                       .1222     .0512     .0182

Since these values (Table 9) are only accurate to two decimal places (as
the Tucker and Lewis figures are to two places), all except the low b₁ⱼ²,
simulation model matrices were within rounding error of the population
values for samples of size 1600. Thus, the decreases in ρ₂ with
increasing sample sizes were toward the population values.
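
The rounding argument can be made concrete with a small Python check on the
n = 1600 column of Table 9. The tolerance of .005 (half a unit in the second
decimal place) is an assumption about what "within rounding error" means
here, and the names are illustrative.

```python
# Sample-minus-population differences in rho_2 at n = 1600, from Table 9.
# Keys are (level of b, model, number of major factors).
differences_1600 = {
    ("high b", "formal", 3): 0.0004,
    ("high b", "formal", 7): -0.0003,
    ("high b", "simulation", 3): 0.0019,
    ("high b", "simulation", 7): 0.0018,
    ("low b", "formal", 3): 0.0023,
    ("low b", "formal", 7): -0.0016,
    ("low b", "simulation", 3): 0.0123,
    ("low b", "simulation", 7): 0.0182,
}

# Assumed tolerance: population values are reported to two decimal places.
ROUNDING_TOLERANCE = 0.005


def within_rounding_error(difference, tolerance=ROUNDING_TOLERANCE):
    """True when a difference could be explained by two-place rounding."""
    return abs(difference) <= tolerance


outside_tolerance = sorted(
    cell for cell, diff in differences_1600.items()
    if not within_rounding_error(diff)
)
print(outside_tolerance)
```

Under this tolerance only the two low b₁ⱼ², simulation model cells fall
outside rounding error, in line with the statement above.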


An important result was pointed out by the significant interactions
between level of b₁ⱼ² and the model for ρ₂ and c₁. In both cases, the
results in the simulation model, low b₁ⱼ² cell were much poorer than would
have been predicted from the main effects alone. These results, an average
ρ₂ of .563 and an average c₁ of 4.513 (over four times greater than the
next largest cell), showed that one cannot expect to support one's hypothesis
with variables that have a low percentage of variance accounted for in the
major factors. It was interesting to note that while the values of c₁ were
about the same in the two cells high b₁ⱼ², simulation model and low b₁ⱼ²,
formal model, the former had ρ₂ = .773 and the latter had ρ₂ = 1.080.
Thus the model correlations were reproduced as well for simulation model,
high b₁ⱼ² as for formal model, low b₁ⱼ².

The fact that level of b₁ⱼ² (25.17%) accounted for as large a
percentage of the total sum of squares of c₁ as did model (24.62%) was
encouraging. Thus, if the simulation model is a better model of the
world, it is still possible for an experimenter to improve his results
by constructing measures with high proportions of variance accounted for
by the major factors.

The χ² statistic was useful for sample factor matrices for the
formal model only. Even then, it led to the acceptance of too few factors
in some cases, with low b₁ⱼ² and/or too few observations.

The measure r₁ did well for the formal model matrices, although
more observations were necessary before it neared its maximum of 1.00.
Also, with seven factors, low b₁ⱼ², and 1600 observations it only attained
.9392. However, the simulation model matrices with high b₁ⱼ² also gave
high values of r₁. Thus the maximum-likelihood estimation procedure was
doing a good job of reproducing the model correlations, but this was not
reflected in ρ₂. The results on matrices 7 and 8 (simulation model, low b₁ⱼ²)
confirmed the importance of controlling the relationship between the major
and minor influences on one's results. The major factors should predominate
over minor factors in any study. The results did indicate that it is
easier to reproduce a small number of factors in a poorly designed study.

Thus r₁ was shown to be useful as a measure of goodness of fit. It
does require the writing down of an hypothesized factor matrix Φ, so it
cannot normally be used in exploratory studies. It has the advantage
that it could be used with any factor analytic procedure. On the other
hand, ρ₂ could be used with any study, as long as the maximum-likelihood
estimation procedure is used. However, it seems somewhat less useful than
r₁ when the factor analytic model does not hold in the population.
An investigator should strive to develop variables which strongly
represent his major factors. He should have a large sample size (a ratio
of five observations per variable was not always sufficient to ensure
good results, even with high b₁ⱼ² and the formal model). If he uses
maximum-likelihood factor analysis, he should usually stop factoring
when the χ² becomes non-significant, if his sample size is large enough.
However, with real data, there may be statistically significant minor
factors which are not of interest. In this case the χ² cannot give an
indication of when to stop factoring. However, the investigator can use
ρ₂ as an estimate of how well the formal factor analytic model holds in
the population from which his data were taken. Although the statistical
properties of this estimate are not known, and it may be high for small
samples, it does have the desirable property of having a value of 1.00
in populations which exactly fit the formal factor analytic model.

Small values of ρ₂ probably indicate a poorly controlled study, and
the investigator may be able to improve his results by using better
controls over minor factors, by having variables with high percents of
variance in the major domain, and by having a higher ratio of variables
to major factors. Finally, in a confirmatory study he should write
down an hypothesized factor matrix and use c₁ and r₁ to determine its
goodness of fit.
MEANS OF ρ₁ AND ρ₂

                                Table 10

                  Means of ρ₁ for 3 Factor Matrices

Matrix   Sample size    1 Factor      2 Factors    3 Factors   4 Factors

   1         100       [illegible]   [illegible]    1.0011      1.0135
             400       [illegible]   [illegible]    1.0017      1.0054
            1600       [illegible]   [illegible]    1.0004      1.0011

   3         100         .4776         .6626         .8570       .8818
             400         .4648         .6458         .8262       .8432
            1600         .4679         .6522         .8321       .8444

   5         100         .6739         .8879         .9932      1.0397
             400         .6584         .8639        1.0126      1.0255
            1600         .6581         .8641        1.0023      1.0050

   7         100         .4719         .5591         .6264       .6863
             400         .4413         .5205         .5714       .6058
            1600         .4577         .5177         .5629       .6112

                                Table 11

                  Means of ρ₁ for 7 Factor Matrices